Abstract

The use of an error correction code in a given transmission channel can be regarded as a statistical
experiment. Therefore, powerful results from the theory of comparison of experiments can be applied to
compare the performances of different error correction codes. We present results on the comparison of
block error correction codes using the representation of an error correction code as a linear experiment.
In this case the code comparison is based on the Loewner matrix ordering of the respective code matrices.
Next, we demonstrate the bit-error rate code performance comparison based on the representation of the
codes as dichotomies, in which case the comparison is based on the matrix majorization ordering of their
respective equivalent code matrices.

Index Terms

Deficiency distance, error-correction code, code design, probability of error, Bayes risk, zonotope,
matrix majorization.



I. Introduction 



In this section we will review the basic concepts of statistical experiments that will later be used to
establish an ordering relation between error correction codes employed in a given communication channel.

Following [1], a statistical experiment is defined as a pair $\mathcal{E} = (\mathcal{X}, P_\theta; \theta \in \Theta)$
where $\mathcal{X}$ is a measurable sample space, $P_\theta$ is a probability measure on $\mathcal{X}$ for each
$\theta \in \Theta$, and $\Theta$ is a parameter set. Let the decision rule $y = \sigma(x)$ of experiment $\mathcal{E}$
be defined as a mapping from an element of the sample space $x \in \mathcal{X}$ to an element of the decision
space $y \in \mathcal{Y}$. Further, one can define the loss function $L(\theta, y)$ of choosing the decision
$y \in \mathcal{Y}$ when the true state of the parameter is $\theta \in \Theta$. The reader is referred to [1] for
a thorough treatment of statistical experiments and loss functions, with numerous illustrative examples. We
will next introduce the notation used throughout the paper, followed by the definition and properties of
the deficiency distance between experiments.
Notation: We will use $a'$ to denote the transpose of vector $a$, $A^{+}$ denotes the Moore-Penrose
pseudoinverse of the matrix $A$, and $\#S$ denotes the cardinality of set $S$. $A \wedge B$ and $A \vee B$ denote
the element-wise minimum and maximum of two equal dimensional matrices $A$ and $B$, respectively. Let
$A(i,:)$ and $A(:,i)$ denote the $i$-th row and $i$-th column of the matrix $A$, respectively. The range
$\mathrm{range}(B)$ of the matrix $B$ is the space spanned by the columns of $B$. Let $e_i$ be
$[0, 0, \ldots, 0, 1, 0, \ldots, 0]$ where 1 is at the $i$-th position of vector $e_i$. Let
$\mathrm{conv}(a, b) = \{\alpha a + (1 - \alpha) b \mid \alpha \in [0, 1]\}$ for equal dimensional vectors $a$ and $b$.
Let $\mathcal{M}_{nm}$ be the set of row-stochastic $n \times m$ dimensional matrices, i.e. these matrices have
nonnegative elements and rows that sum to 1. Let the indicator function $I\{x\}$ be equal to 1 if $x$ is true
and 0 otherwise. $N(\mu, \Sigma)$ denotes the multivariate normal random variable with mean $\mu$ and covariance
$\Sigma$. $\|f\|$ denotes the total variation of function $f(x)$ given by $\int |f(x)|\,dx$. $E_\nu$ is the
expectation with respect to (w.r.t.) random variable $\nu$.

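Le Cam introduced in [2] the following definition of the deficiency distance between two experiments:

Definition 1: (Deficiency Distance) Experiment $\mathcal{E} = (\mathcal{X}, P_\theta; \theta \in \Theta)$ is
$\epsilon$-deficient w.r.t. experiment $\mathcal{F} = (\mathcal{Y}, Q_\theta; \theta \in \Theta)$ if for each
$\Theta_0 \subseteq \Theta$, and each decision rule $\rho$ of experiment $\mathcal{F}$, there exists a decision
rule $\sigma$ of experiment $\mathcal{E}$ such that

$$\int P_\theta(x)\, L(\theta, \sigma(x))\, dx \le \int Q_\theta(y)\, L(\theta, \rho(y))\, dy + \epsilon \max_{y} L(\theta, y) \qquad (1)$$

when $\theta \in \Theta_0$. In short form, the deficiency distance is denoted as $\epsilon = \delta(\mathcal{E}, \mathcal{F})$.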

The deficiency distance can plausibly be interpreted as follows: the deficiency of experiment $\mathcal{E}$
w.r.t. the experiment $\mathcal{F}$ is the upper bound on the difference in the risk function¹ between
experiments $\mathcal{E}$ and $\mathcal{F}$ for any choice of the a priori knowledge of the unknown parameter
$\theta$ and some decision rule on $\mathcal{E}$. In general, therefore, the deficiency distance is not
symmetric, i.e. $\delta(\mathcal{E}, \mathcal{F}) \ne \delta(\mathcal{F}, \mathcal{E})$.

Historically, the deficiency distance is a generalization of the older concept of sufficiency ordering of 
experiments. 

Definition 2: (Sufficient Experiments [3], [1]) Experiment $\mathcal{E} = (\mathcal{X}; P_\theta, \theta \in \Theta)$
is sufficient for experiment $\mathcal{F} = (\mathcal{Y}; Q_\theta, \theta \in \Theta)$ iff
$\delta(\mathcal{E}, \mathcal{F}) = 0$. The ordering of experiments based on sufficiency is also denoted by
$\mathcal{E} \ge_S \mathcal{F}$.

¹ The expression $\int P_\theta(x)\, L(\theta, \sigma(x))\, dx$ is commonly referred to as the risk function of
the experiment $\mathcal{E} = (\mathcal{X}, P_\theta; \theta \in \Theta)$ for a given loss function $L$ and
decision rule $\sigma$.

Note that the ordering $\ge_S$ is a partial ordering on the set of experiments with the same parameter set.
This ordering is sometimes also referred to as Blackwell ordering. We next discuss the basic properties of
the deficiency distance.

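Theorem 1: (Properties of Deficiency Distance) Let $\mathcal{E}$, $\mathcal{F}$, $\mathcal{G}$ be experiments. Then

(i) $0 \le \delta(\mathcal{E}, \mathcal{F}) \le 2 - 2(\#\Theta)^{-1}$

(ii) $\delta(\mathcal{E}, \mathcal{G}) \le \delta(\mathcal{E}, \mathcal{F}) + \delta(\mathcal{F}, \mathcal{G})$

(iii) $\delta(\mathcal{E}, \mathcal{E}) = 0$

Proof: Theorem 6.2.24 in [1]. □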

A special case of a statistical experiment is a linear experiment, defined below²:

Definition 3: (Linear Normal Experiment) A linear normal experiment $\mathcal{E}$ is denoted as
$\mathcal{E}(A, \Sigma; \beta \in \mathbb{R}^n)$ and is the experiment with the sample distribution
$N(A'\beta, \Sigma)$, where the parameter is $\beta \in \mathbb{R}^n$ and the parameter set is $\mathbb{R}^n$.
The real valued matrices $A$ and $\Sigma$ are assumed to be known.

Two linear experiments can be compared as follows.

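Theorem 2: (Linear Experiment Comparison) (a) Linear normal experiment $\mathcal{E}(A, \Sigma; \beta \in \mathbb{R}^n)$
is sufficient for $\mathcal{F}(B, \Gamma; \beta \in \mathbb{R}^n)$ (i.e. $\delta(\mathcal{E}, \mathcal{F}) = 0$) if

$$A \Sigma A' - B \Gamma B' \text{ is a non-negative definite matrix.} \qquad (2)$$

(b) Let $\Sigma = I$ and $\Gamma = I$. Then $\delta(\mathcal{E}, \mathcal{F}) < 2$ iff
$\mathrm{range}(B) \subseteq \mathrm{range}(A)$. Furthermore, if $\mathrm{range}(B) \subseteq \mathrm{range}(A)$ then

$$\delta(\mathcal{E}, \mathcal{F}) = \| N(0, (B'(AA')^{+}B) \vee I) - N(0, I) \|. \qquad (3)$$

Proof: Theorems 8.2.13 and 8.5.7 in [1]. □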

General statistical results and inequalities can provide important insights into the operation and efficient
design of a communication system. An early result on the comparison of communication channels based on the
Blackwell ordering of channel transfer matrices was presented in [4]. More recently, monotonicity results
on the influence of Rician fading on the capacity of MIMO systems have been demonstrated in [5]. Based on
general inequalities of stochastic majorization, a performance comparison of various receivers in multipath
fading channels and the influence of the power delay profile have been shown in [6].

Starting with the general notion of deficiency, we will present in the next section the application of
these concepts to the comparison of error correction codes. Namely, in Subsection II-A we present the
block error correction code comparison in additive white Gaussian noise (AWGN) channels based on linear
experiments and block-error rates. In Subsection II-B we present the bit-error rate comparison of
error-correction codes used in discrete channels based on matrix majorization.

² A more general definition of a linear experiment may be given as in [1], Section 8.



II. Error Correction Block Code Comparison 



Let an $(M, n)$ block error correction code, denoted by $\mathcal{C}$, be defined as a map
$\phi_{\mathcal{C}} : i \mapsto x_i$ from the set $\{1, \ldots, M\}$ of possible information messages to the set
of code words $x_i \in \mathcal{A}^n$ for $i = 1, \ldots, M$. Let $\mathcal{A}$ denote the alphabet of the code
symbols.

Let the received message be $z$ if the transmitted code word is $x = \phi_{\mathcal{C}}(y)$ as a result of
encoding the information message $y$. The dependence of the received message $z$ on $x$ is described with a
probabilistic law, examples of which will be discussed in more detail in the following sections.

The original transmitted message $y$ is recovered through the use of a decoder $\delta$, which maps the
received message $z$ to a possible transmitted data message $\hat{y}$:

$$\delta(z) = \hat{y}. \qquad (4)$$

In information theory as well as in communications practice, a common metric used to evaluate the
performance of a block error correction code is the packet (code-word) error probability. Let the
transmitted message be $y$. Then, the packet error probability of code $\mathcal{C}$ is equal to

$$P_e^{\mathcal{C}}(y) = E_z\left[ I\{\delta(z) \ne y\} \right] \qquad (5)$$

where the expectation is over realizations of the random variable $z$ of the received code word given the
transmitted data message $y$.

A. Comparison of Error Correction Codes in AWGN Channels Based on Linear Normal Experiments

In this section we will address the transmission of the coded message over an additive white Gaussian
noise channel. Namely, assume that the code word $x \in \mathbb{R}^n$ is being transmitted and that the
alphabet of the code symbols is $\mathbb{R}$. The code words are assumed to be energy bounded such that
$xx' \le E$, where the parameter $E$ is the upper bound on the energy of the code word. The received data
is given as

$$z = x + \nu \qquad (6)$$
where $\nu$ is the additive Gaussian noise with covariance matrix $E_\nu[\nu\nu'] = \sigma^2 I$. The signal to
noise ratio is therefore bounded from above in terms of $E$ and $\sigma^2$.

TABLE I
Equivalent terminology used in statistical experiments and error-correction codes

Statistical Experiments    Error-Correction Codes
-----------------------    --------------------------
parameter $\theta$         information message
decision rule              decoding algorithm
sample space               set of received code words
loss function              rate-distortion measure

For transmission in an additive white Gaussian noise channel, a block error correction code can be
interpreted as a linear experiment $\mathcal{E}(A, I\sigma^2; \beta \in \mathbb{R}^M)$ such that the parameter
set is $\Theta = \mathbb{R}^M$ and the matrix is $A = [x_1 x_2 \cdots x_M]'$. The $i$-th information message is
represented with the parameter $\beta = e_i$, $i = 1, \ldots, M$. The matrix $A$ that uniquely describes such a
block code will be called the code matrix and the adjoint block code will be denoted in short with
$\mathcal{A}$. In the context of the comparison of experiments, the decoder $\delta$ can be interpreted as the
decision rule, while $I\{\delta(z) \ne \beta\}$ is the loss function, for a received message $z$ and transmitted
data message $\beta$. A comparison of error-correction code and statistical experiment terminology is shown
in Table I for easy reference.
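As an illustration of the loss function $I\{\delta(z) \ne \beta\}$ and its expectation over the noise, the following minimal sketch estimates the packet error rate by Monte Carlo simulation. The nearest-code-word decoder, the two toy repetition codes and all function names are assumptions made for this example and are not taken from the paper; messages are indexed from 0 in the code.

```python
import random

def nearest_codeword_decoder(z, code_matrix):
    """Return the index of the code word (row of the code matrix) closest to z."""
    def sqdist(x):
        return sum((zi - xi) ** 2 for zi, xi in zip(z, x))
    return min(range(len(code_matrix)), key=lambda i: sqdist(code_matrix[i]))

def packet_error_rate(code_matrix, sigma, message, trials=20000, seed=0):
    """Monte Carlo estimate of E[ I{decoder(z) != message} ] for z = x + noise."""
    rng = random.Random(seed)
    x = code_matrix[message]
    errors = 0
    for _ in range(trials):
        # AWGN channel: independent Gaussian noise with standard deviation sigma
        z = [xi + rng.gauss(0.0, sigma) for xi in x]
        if nearest_codeword_decoder(z, code_matrix) != message:
            errors += 1
    return errors / trials

# Hypothetical codes with M = 2 messages (antipodal repetition codes).
A = [[1.0] * 5, [-1.0] * 5]   # code words of length n = 5
B = [[1.0], [-1.0]]           # code words of length n = 1

per_A = packet_error_rate(A, sigma=1.0, message=0)
per_B = packet_error_rate(B, sigma=1.0, message=0)
print(per_A, per_B)
```

The estimate is only as good as the number of trials, and the nearest-code-word rule is one possible decision rule among many.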

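Proposition 1: (Comparison of two codes based on the packet error probability) Let $A$ and $B$ be two
code matrices that define block error correction codes $\mathcal{A}$ and $\mathcal{B}$ respectively.

(a) If the noise variance $\sigma^2$ is a known parameter and if

$$AA' - BB' \text{ is a non-negative definite matrix} \qquad (7)$$

then there exists a decoder of the code $\mathcal{A}$ whose packet error rate is never larger than that of
any decoder of the code $\mathcal{B}$:

$$P_e^{\mathcal{A}}(\beta) \le P_e^{\mathcal{B}}(\beta) \qquad (8)$$

for any transmitted data message $\beta = e_i$, $i \in \{1, \ldots, M\}$.

(b) If the above condition is not satisfied and if $\mathrm{range}(B) \subseteq \mathrm{range}(A)$, then for any
data message $\beta$

$$P_e^{\mathcal{A}}(\beta) - P_e^{\mathcal{B}}(\beta) \le \delta(\mathcal{A}, \mathcal{B}) = \| N(0, (B'(AA')^{+}B) \vee I) - N(0, I) \| \qquad (9)$$

where $P_e^{\mathcal{A}}(\beta)$ and $P_e^{\mathcal{B}}(\beta)$ are the packet error rates of block codes
$\mathcal{A}$ and $\mathcal{B}$ respectively.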
Proof: The proof follows from the definition of the deficiency distance and Theorem 2. Let
$\Theta_0 = \{e_i \mid i = 1, 2, \ldots, M\}$ and the decision space be $\mathcal{Y} = \{1, 2, \ldots, M\}$, while
the loss function is defined as

$$L(\theta, y) = I\{y \ne \arg\max_i \theta_i\}, \quad \theta = [\theta_1, \ldots, \theta_M] \in \Theta_0, \ y \in \mathcal{Y}. \qquad (10)$$

The expected value of the above loss function corresponds to the risk function, which is equal to the block
error probability when averaged over random noise realizations, i.e.,

$$E_\nu\left[ L(\theta, \delta(z)) \right] = P_e(\theta). \qquad (11)$$

According to Definition 2, statement (a) of this proposition immediately follows from Theorem 2(a).
Similarly, (b) directly follows from Theorem 2(b), where Definition 1 is applied for the loss function
in (10) and by noting that $\max_y L(\theta, y) = 1$. □

Remark 1: (Influence of the Information Message Distribution) Note that the conclusions of
Proposition 1 are valid for any data message $\beta = e_i$, $i = 1, \ldots, M$. Therefore, it follows that the
code comparison results of Proposition 1 are valid for any a priori distribution of data messages.

Remark 2: (Loewner Order and Moment Generating Matrices) The matrix partial order $A \ge B$ whenever
$AA' - BB'$ is non-negative definite is commonly referred to as the Loewner order. In the experiment design
literature the matrix $AA'$ is commonly referred to as the moment matrix of a linear experiment
$\mathcal{E}(A, \Sigma; \beta \in \mathbb{R}^n)$.

Remark 3: (Bayes risk) The conclusions of Proposition 1 are valid for the comparison of two codes with
respect to any loss function and not just the packet-error probability. Namely, any Bayesian risk function
can be used and the conclusions of the Proposition still hold.

Remark 4: (Limitations of the code comparison based on linear experiments) The code comparison
based on the Loewner order is very general and strong. There are two reasons for this statement: (i) with
respect to the block code, the parameter set of the adjoint linear experiment is extended from a finite set
to the whole space $\mathbb{R}^M$, and (ii) the comparison is valid for any risk function and may encompass
risk functions that are not of interest in code design and performance analysis. Therefore, for some
applications the packet error rate bound based on the deficiency bound may be loose.

Next, we give an example of the code comparison based on the linear experiment comparison. The codes
compared have the same parameter set.

Example 1: (a) The BCH(63,7) code is better than the BCH(15,7) code for any loss function. (b) However,
a code comparison between BCH(31,7) and BCH(15,7) cannot be established based on Proposition 1. These
conclusions are easily established by calculating the respective code matrices and checking whether the
Loewner ordering holds.
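The Loewner check used in Proposition 1(a) can be sketched in a few lines. The example below is a minimal illustration in plain Python; the toy code matrices and the helper names `gram`, `is_nonneg_definite` and `loewner_geq` are assumptions for this sketch and are not the BCH codes of Example 1. Non-negative definiteness is tested by symmetric elimination (successive Schur complements) rather than by an eigenvalue routine.

```python
def gram(A):
    """Moment matrix A A' of a code matrix given as a list of rows (code words)."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in A] for ri in A]

def is_nonneg_definite(S, tol=1e-9):
    """Test a symmetric matrix for non-negative definiteness by elimination."""
    S = [row[:] for row in S]          # work on a copy
    n = len(S)
    for k in range(n):
        pivot = S[k][k]
        if pivot < -tol:
            return False               # negative diagonal entry of a Schur complement
        if pivot <= tol:
            # A zero pivot is only allowed if the rest of its row vanishes.
            if any(abs(S[k][j]) > tol for j in range(k + 1, n)):
                return False
            continue
        for i in range(k + 1, n):
            factor = S[i][k] / pivot
            for j in range(k + 1, n):
                S[i][j] -= factor * S[k][j]
    return True

def loewner_geq(A, B):
    """True if A A' - B B' is non-negative definite (A and B need equal row counts)."""
    GA, GB = gram(A), gram(B)
    diff = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(GA, GB)]
    return is_nonneg_definite(diff)

# Hypothetical toy codes with M = 2 messages.
A = [[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]]   # length-3 antipodal repetition code
B = [[1.0], [-1.0]]                         # uncoded antipodal signalling

print(loewner_geq(A, B), loewner_geq(B, A))
```

Both code matrices must have the same number of rows $M$; the code word lengths may differ, since only the $M \times M$ moment matrices are compared.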

B. Comparison of Error Correction Codes in Discrete Channels: Matrix Majorization and Zonotopes 

In this section we assume that the code symbol alphabet $\mathcal{A}$ is finite with $l$ elements. Further,
we assume that channel outputs also belong to a finite alphabet of $w$ symbols, i.e. that the channel is
discrete. In light of the experiment comparison terminology, the $w^n$ possible received noisy code words
are therefore elements of the sample space $\mathcal{X}$, while the $M = 2^k$ information messages (of length
$k$ bits) are elements of the parameter set $\Theta$.

A block error correction code $\mathcal{E}$ over a finite code symbol alphabet can be represented with a
row-stochastic $2^k \times l^n$ code matrix $M_{\mathcal{E}}$. The code word corresponding to the information
message $i$ is an element of the set $\mathcal{A}^n$. Row $i$ of $M_{\mathcal{E}}$ corresponds to the information
message $i \in \{1, \ldots, 2^k\}$ and is equal to $e_{\phi_{\mathcal{E}}(i)}$, i.e. it is equal to the unit vector
with 1 in the $\phi_{\mathcal{E}}(i)$-th place. As will be demonstrated below using Theorem 3, the ordering of
information and code messages is arbitrary and our subsequent results do not depend on this ordering.

Channel $C$ is modeled as follows. Since the number of observations is considered to be finite, the channel
is modeled with a row-stochastic $l^n \times w^n$ matrix $C = [p_{ij}] \in \mathcal{M}_{l^n w^n}$. Probability
$p_{ij}$ is the probability of receiving the $j$-th element of the sample space $\mathcal{X}$ if code word
$i \in \mathcal{A}^n$ is being sent. This general channel description incorporates binary symmetric channels
(BSC) (as discussed in Example 2) as well as many other channels such as bursty error channels.

Following the naming conventions introduced in Section I, a coding experiment will be considered as the
observation of the parameter (or message) $\theta \in \Theta$ in channel $C$ after coding the message with the
block code $\mathcal{E}$. Therefore, the probability measure $P_\theta$ of such an experiment on the finite
sample space $\mathcal{X}$ is given by the rows of the transfer matrix $M_{\mathcal{E}} C$. Using the theory of
experiment comparison, the performance comparison of two block codes $\mathcal{E}$ and $\mathcal{F}$ in the
same channel $C$ can be based on their transfer matrices $M_{\mathcal{E}} C$ and $M_{\mathcal{F}} C$.
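To make the construction of the transfer matrix concrete, the following minimal sketch builds a code matrix and a memoryless BSC channel matrix for a binary alphabet ($l = w = 2$) and multiplies them. The (3,1) repetition code, the crossover probability and the helper names are assumptions chosen for illustration only.

```python
from itertools import product

def bsc_channel_matrix(n, p):
    """Row-stochastic 2^n x 2^n matrix; entry [i][j] = P(receive word j | send word i)."""
    words = list(product((0, 1), repeat=n))
    C = []
    for sent in words:
        row = []
        for received in words:
            d = sum(s != r for s, r in zip(sent, received))   # Hamming distance
            row.append((p ** d) * ((1 - p) ** (n - d)))
        C.append(row)
    return words, C

def code_matrix(codewords, words):
    """Row i is the unit vector with a 1 at the position of code word i among all words."""
    index = {w: j for j, w in enumerate(words)}
    M = [[0.0] * len(words) for _ in codewords]
    for i, cw in enumerate(codewords):
        M[i][index[tuple(cw)]] = 1.0
    return M

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

# Hypothetical example: (3,1) repetition code in a BSC with crossover probability 0.1.
words, C = bsc_channel_matrix(n=3, p=0.1)
M_E = code_matrix([(0, 0, 0), (1, 1, 1)], words)
T = matmul(M_E, C)   # 2 x 8 transfer matrix; row i is the output distribution of message i
print(T[0])
```

Since each row of the code matrix is a unit vector, the product simply selects the rows of $C$ that belong to the code words.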

Let us consider the channel $C$. Then, code $\mathcal{E}$ is sufficient for code $\mathcal{F}$ if
$M_{\mathcal{E}} C M = M_{\mathcal{F}} C$ for some $w^n \times w^n$ dimensional row-stochastic matrix $M$ [1].
This condition is equivalent to the matrix majorization [7] of matrix $M_{\mathcal{E}} C$ with respect to
$M_{\mathcal{F}} C$ and is also denoted by $M_{\mathcal{E}} C \succ M_{\mathcal{F}} C$. Therefore, for any loss
function and decoder associated with decoding of the code $\mathcal{F}$, there exists a decoder for the code
$\mathcal{E}$ that produces less or equal risk. In particular, under the above majorization condition, code
$\mathcal{E}$ can achieve a code word error probability no larger than that of the code $\mathcal{F}$.
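A small numerical instance of the condition $M_{\mathcal{E}} C M = M_{\mathcal{F}} C$ may help. In the sketch below the two transfer matrices are simply single-use BSCs with crossover probabilities 0.1 and 0.2 (i.e. uncoded one-bit transmission), and the garbling matrix $M$ is itself a BSC with crossover 0.125; these numbers and the helper names are assumptions chosen so that the identity can be verified by hand.

```python
def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def is_row_stochastic(M, tol=1e-12):
    """Nonnegative entries and unit row sums."""
    return all(min(row) >= -tol and abs(sum(row) - 1.0) <= tol for row in M)

def bsc(p):
    """2 x 2 transfer matrix of a single use of a binary symmetric channel."""
    return [[1.0 - p, p], [p, 1.0 - p]]

T_E = bsc(0.1)     # transfer matrix of the more informative experiment
T_F = bsc(0.2)     # transfer matrix of the less informative experiment
M = bsc(0.125)     # candidate garbling: 0.9 * 0.125 + 0.1 * 0.875 = 0.2

product_EM = matmul(T_E, M)
residual = max(abs(a - b) for ra, rb in zip(product_EM, T_F) for a, b in zip(ra, rb))
print(is_row_stochastic(M), residual)
```

For larger transfer matrices a suitable $M$ is generally not known in advance and has to be searched for, which is where the linear programming formulation becomes relevant.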

Several properties of the matrix majorization are shown in [7]. This article also demonstrates that it is 
possible to check the matrix majorization ordering between two matrices by checking the feasibility of a 
linear program. 

A linear operator $T : \mathcal{M}_{nm} \rightarrow \mathcal{M}_{nm}$ is said to preserve the matrix majorization
ordering if $A \succ B$ implies $T(A) \succ T(B)$ for all $A, B \in \mathcal{M}_{nm}$.

Theorem 3: (Preservation of the Matrix Majorization Ordering [8]) A linear operator
$T : \mathcal{M}_{nm} \rightarrow \mathcal{M}_{nm}$ preserves the matrix majorization ordering if and only if
$T(X) = LXP$, where $L \in \mathcal{M}_{nn}$ is an invertible matrix and $P \in \mathcal{M}_{mm}$ is a
permutation matrix.

An interpretation of Theorem 3 in the context of the error correction code comparison is as follows.
Since matrix majorization is preserved if matrices are multiplied from the right by any permutation matrix
$P$, the code comparison ordering is preserved for any permutation of the received code words in the
transmission channel. Also, let us first consider the case when the invertible matrix $L$ in the above
theorem is also a permutation matrix. Then, it follows directly from Theorem 3 that the code comparison
ordering is preserved for any permutation of the input data messages. The general case of any invertible
matrix $L$ is not of interest as it would amount to randomization of data messages at the input of the
encoder.

Due to the large dimensions of the matrices $M_{\mathcal{E}} C$ and $M_{\mathcal{F}} C$ it might not be
computationally simple to check whether the matrix majorization ordering $\succ$ can be established between
these two transfer matrices. Therefore, to simplify the setting and to provide more insight into the matrix
majorization ordering, we will next consider the special case of detecting a single bit of the information
message $\theta$ consisting of $k$ bits.

To accomplish that, we first have to introduce the following assumption:
A1: The a priori probability distribution of data messages $p(\theta)$ is known.

This assumption is warranted in practical systems. For example, source coding is usually used prior to
error correction coding, which renders information messages uniformly distributed.

Let us concentrate on the decoding of the $r$-th bit of the $k$-bit long information message
$\theta = [b_1 b_2 \cdots b_r \cdots b_k]$. Therefore bit $b_r$ is the parameter of the experiment and the
parameter set is $\Theta = \{0, 1\}$. All other bits in the information message are considered to be nuisance
parameters. In the long term, the effect of the nuisance parameters can be averaged out by introducing the
equivalent code matrix as follows.
Let $\Theta^r(0) = \{\theta = [b_1 b_2 \cdots b_r \cdots b_k] \mid b_r = 0\}$ be the set of information messages
for which the $r$-th bit is equal to 0. Similarly, let
$\Theta^r(1) = \{\theta = [b_1 b_2 \cdots b_r \cdots b_k] \mid b_r = 1\}$ be the set of information messages for
which the $r$-th bit is equal to 1.

The equivalent $2 \times w^n$ transfer matrix $M^r_{\mathcal{E}}$ for the $r$-th bit is defined by averaging the
rows of $M_{\mathcal{E}} C$ over $\Theta^r(0)$ (first row) and over $\Theta^r(1)$ (second row) with respect to
the a priori distribution $p(\theta)$. (12)

The coefficients of this equivalent transfer matrix will be denoted with $e^r_{jk}$; $j = 1, 2$;
$k = 1, \ldots, w^n$.
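The averaging over nuisance bits can be sketched as follows. This is one plausible reading of the construction, assuming that each row of the equivalent matrix is the prior-weighted average of the corresponding rows of the transfer matrix, normalized by the prior mass of the bit value so that the result is row-stochastic. The function name, the message ordering and the toy numbers are hypothetical.

```python
from itertools import product

def equivalent_bit_matrix(T, prior, r):
    """Collapse a 2^k x N transfer matrix T into a 2 x N matrix for bit r (0-based).

    Rows of T are assumed to be indexed by messages in the order
    product((0, 1), repeat=k); prior[i] is the a priori probability of message i.
    """
    k = len(T).bit_length() - 1
    messages = list(product((0, 1), repeat=k))
    n_out = len(T[0])
    rows = []
    for bit in (0, 1):
        idx = [i for i, m in enumerate(messages) if m[r] == bit]
        weight = sum(prior[i] for i in idx)          # prior mass of {b_r = bit}
        if weight <= 0:
            raise ValueError("bit value has zero prior probability")
        # Assumed normalization: conditional distribution of the output given b_r = bit.
        rows.append([sum(prior[i] * T[i][j] for i in idx) / weight for j in range(n_out)])
    return rows

# Hypothetical 4 x 2 transfer matrix for k = 2 message bits and a uniform prior.
T = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
prior = [0.25, 0.25, 0.25, 0.25]
E_first_bit = equivalent_bit_matrix(T, prior, r=0)
print(E_first_bit)
```

With a uniform prior the normalization reduces to a plain average of the rows that share the same value of the selected bit.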

By introducing the equivalent experiment for each of the bits of the information message $\theta$, we can
represent the block code $\mathcal{E}$ as a sequence of $k$ experiments with a parameter set of just two
elements. In the statistical literature an experiment with a parameter set of cardinality 2 is usually
called a dichotomy. A dichotomy will be denoted with $\mathcal{D} = \{D, \theta \in \{0, 1\}\}$, where $D$ is a
2-row stochastic transfer matrix. In the case of dichotomies, the matrix majorization has several very
useful simplifying properties. However, to be able to use these properties, we first have to introduce the
concept of zonotopes.

Definition 4: (Dichotomy Zonotopes [7]) Consider a dichotomy $\mathcal{D}_A = \{A, \theta \in \{0, 1\}\}$ where
$A = \{a_{ij}\}$ is a $2 \times n$ dimensional row stochastic matrix. The zonotope of the dichotomy
$\mathcal{D}_A$ is defined as

$$Z(\mathcal{D}_A) = \bigoplus_{i \in \{1, \ldots, n\}} \mathrm{conv}(0, A(:, i)) \qquad (13)$$

where the addition $\oplus$ in the previous equation is the Minkowski addition.

Recall that the Minkowski addition $\oplus$ of two sets is the set of sums of all possible combinations of
elements from these two sets, i.e.

$$A \oplus B = \{a + b \mid a \in A, b \in B\} \qquad (14)$$
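The Minkowski sum in (13) and (14) can be explored numerically. Because each summand is a segment from the origin to a column of $A$, the zonotope is the convex hull of all sums of subsets of columns. The sketch below enumerates these subset sums for a small, hypothetical dichotomy matrix (a single-use BSC with crossover probability 0.1); the function name is an assumption of this example, and the brute-force enumeration is only practical for small $n$.

```python
from itertools import product

def zonotope_candidate_points(A):
    """Sums of all subsets of columns of a 2 x n matrix A.

    The zonotope of the dichotomy is the convex hull of the returned points.
    """
    n = len(A[0])
    points = set()
    for choice in product((0, 1), repeat=n):
        x = sum(c * A[0][j] for j, c in enumerate(choice))
        y = sum(c * A[1][j] for j, c in enumerate(choice))
        points.add((round(x, 12), round(y, 12)))   # rounding merges float duplicates
    return sorted(points)

# Hypothetical dichotomy: one use of a BSC with crossover probability 0.1.
D = [[0.9, 0.1], [0.1, 0.9]]
points = zonotope_candidate_points(D)
print(points)
```

For a row-stochastic matrix the empty subset gives the origin and the full subset gives the point (1, 1), and the point set is mapped onto itself by the reflection $(x, y) \mapsto (1 - x, 1 - y)$.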

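Dahl [7] showed that the dichotomy zonotope is a polygon that contains the origin and is symmetric
with respect to the point $(\frac{1}{2}, \frac{1}{2})$. The upper boundary of the dichotomy zonotope can be
calculated as

$$\beta_A(x) = \max_y \{ y \mid (x, y) \in Z(\mathcal{D}_A) \} \qquad (15)$$

$$\phantom{\beta_A(x)} = \max \Big\{ \sum_{j=1}^{n} a_{2j} \delta_j \ \Big|\ \sum_{j=1}^{n} a_{1j} \delta_j \le x,\ 0 \le \delta_j \le 1 \Big\} \qquad (16)$$

for $x \in [0, 1]$. Now, two dichotomies can be compared as follows: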

Theorem 4: (Comparison of Dichotomies [7]) Let $\mathcal{A} = \{A, \theta \in \{0, 1\}\}$ and
$\mathcal{B} = \{B, \theta \in \{0, 1\}\}$ be two dichotomies with the same parameter set. Then, the following
properties are equivalent:

(a) $\mathcal{A} \le_S \mathcal{B}$

(b) $A \prec B$

(c) $Z(\mathcal{A}) \subseteq Z(\mathcal{B})$

(d) $\beta_A(x) \le \beta_B(x)$, $x \in [0, 1]$.

Proof: Corollary 4.2 in [7]. □
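The boundary-function criterion of Theorem 4 lends itself to a direct numerical test. The sketch below computes the vertices of the upper boundary by adding the columns in order of decreasing slope (the greedy solution of the linear program in (16)) and then checks the inequality between boundary functions at those vertices. Checking only the vertices of the smaller boundary suffices because both boundary functions are concave and piecewise linear. The function names and the BSC test matrices are assumptions of this example.

```python
import math

def upper_boundary_vertices(A):
    """Vertices of the upper boundary of the zonotope of a 2 x n dichotomy matrix A."""
    cols = [(A[0][j], A[1][j]) for j in range(len(A[0])) if A[0][j] > 0 or A[1][j] > 0]
    # Steepest generators first; vertical generators (a1 = 0) have angle pi/2.
    cols.sort(key=lambda c: math.atan2(c[1], c[0]), reverse=True)
    verts = [(0.0, 0.0)]
    for a1, a2 in cols:
        x, y = verts[-1]
        verts.append((x + a1, y + a2))
    return verts

def beta(A, x):
    """Upper boundary value at x, by linear interpolation between boundary vertices."""
    verts = upper_boundary_vertices(A)
    if x >= verts[-1][0]:
        return verts[-1][1]
    best = 0.0
    for (x0, y0), (x1, y1) in zip(verts, verts[1:]):
        if x1 == x0:
            if x == x0:
                best = max(best, y1)       # vertical piece: take its top end
        elif x0 <= x <= x1:
            best = max(best, y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return best

def zonotope_contained(A, B, tol=1e-12):
    """True if beta_A <= beta_B at every vertex of the upper boundary of A."""
    return all(y <= beta(B, x) + tol for x, y in upper_boundary_vertices(A))

# Hypothetical dichotomies: single-use BSCs with crossover 0.1 and 0.2.
D1 = [[0.9, 0.1], [0.1, 0.9]]
D2 = [[0.8, 0.2], [0.2, 0.8]]
print(zonotope_contained(D2, D1), zonotope_contained(D1, D2))
```

The greedy ordering by slope is the standard fractional-knapsack argument; for very large $n$ the sort dominates the cost.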

Now, using the per-bit equivalent matrix representation of the block code experiment we can state the 
following corollary regarding the bit-error probabilities of the decoding of a particular bit of a code word. 

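Corollary 1: (Per-bit Code Comparison) Let $\mathcal{E}$ and $\mathcal{F}$ be two block error correction codes
used in channels $C_1$ and $C_2$ with equivalent transfer matrices $M^r_{\mathcal{E}} = \{e^r_{jk}\}$ and
$M^r_{\mathcal{F}} = \{f^r_{jk}\}$, respectively. Then, the probability of error in decoding the $r$-th bit of
the information message of code $\mathcal{E}$ can always be made no larger than that of decoding code
$\mathcal{F}$ if, for all $x \in [0, 1]$,

$$\max \Big\{ \sum_{j=1}^{n} e^r_{2j} \delta_j \ \Big|\ \sum_{j=1}^{n} e^r_{1j} \delta_j \le x,\ 0 \le \delta_j \le 1 \Big\} \ge \qquad (17)$$

$$\max \Big\{ \sum_{j=1}^{n} f^r_{2j} \delta_j \ \Big|\ \sum_{j=1}^{n} f^r_{1j} \delta_j \le x,\ 0 \le \delta_j \le 1 \Big\} \qquad (18)$$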

We next illustrate the shape and certain properties of the zonotope of a block error correction code. 

Example 2: (BCH Code Comparison using Zonotopes) Consider the use of the BCH(15,7) code $\mathcal{E}$ in
a binary symmetric channel (BSC) with probability of error $p$. In Figure 1, we show the zonotopes
$Z_1(\mathcal{E})$ and $Z_2(\mathcal{E})$ of the first information bit for the use of this code in a BSC with
probabilities of error $p_1 = 0.1$ and $p_2 = 0.2$. The figure shows that $Z_1(\mathcal{E}) \supseteq Z_2(\mathcal{E})$
and hence that code $\mathcal{E}$ performs better in the channel with error probability $p_1$ than in the
channel with error probability $p_2$.

Remark 5: The error-correction code comparison based on dichotomies and bit-error rates is more specific
than the comparison based on linear experiments. This is because the comparison based on dichotomies uses
more information about the structure of the parameter set that is used to convey information. In contrast,
the parameter set in the case of linear experiments is the set $\mathbb{R}^M$ even if the true set of
information messages is the finite set $\Theta = \{1, \ldots, M\}$.
Fig. 1. Illustration of the zonotopes of the BCH(15,7) code in the BSC channel with p = 0.2 and p = 0.1.

III. Extensions and Conclusion 

In a similar manner one can compare the performance of spreading codes in CDMA systems or compare the
performances of two distinct Multiple-Input Multiple-Output (MIMO) channels. For example, consider a MIMO
channel

$$y = Ax + n \qquad (19)$$

where $x$ is a $t$-dimensional column vector of the transmitted message, $y$ is an $r$-dimensional column
vector of the received message, $A$ is the $r \times t$-dimensional channel matrix, and $n$ is the additive
Gaussian noise with covariance matrix $I_r$.

The results of Theorem 2 can be directly applied and the channels can be compared using the concepts of
Loewner ordering and deficiency. For example, the MIMO channel defined with channel matrix $A$ is better
than the MIMO channel defined with channel matrix $B$ iff $AA' - BB'$ is non-negative definite. The criterion
for performance comparison can be any loss function for the estimation of the uncoded transmitted signal
$x$. Results with a similar flavor have been obtained in [5] for the comparison of two Rician MIMO channels,
but without the use of the powerful theory of the comparison of experiments.

In conclusion, let us mention that the elegant statistical results on the comparison of experiments and
deficiencies [3], [1], [2] have been known for several decades. However, to the best knowledge of the
author, despite their relevance to problems of information transfer, these results have so far received
limited attention in information and communication theory.